Bayesian Inference for Gaussian Semiparametric Multilevel Models

Authors

  • Jason Bentley
  • Cathy Yuen
  • Yi Lee
Abstract

Bayesian inference for complex hierarchical models with smoothing splines is typically intractable, so approximate inference methods are required in practice. Markov chain Monte Carlo (MCMC) is the standard method for generating samples from the posterior distribution. However, for large or complex models, MCMC can be computationally intensive or even infeasible. Mean Field Variational Bayes (MFVB) is a fast, deterministic alternative to MCMC. Within a factorized (mean field) family, it provides the approximating distribution with minimum Kullback-Leibler divergence to the posterior. Unlike MCMC, MFVB scales efficiently to large and complex models. We derive MFVB algorithms for Gaussian semiparametric multilevel models and implement them in SAS/IML® software. To improve speed and memory efficiency, we use block decomposition to streamline the estimation of the large sparse covariance matrix. Through a series of simulations and real data examples, we demonstrate that the inference obtained from MFVB is comparable to that of PROC MCMC. We also show in practice how to estimate additional posterior quantities of interest from MFVB, either directly or via Monte Carlo simulation.

INTRODUCTION

The growing use of machine learning and data mining tools has helped researchers capture and understand patterns in large and complex datasets, which typically have a grouped or hierarchical structure. The most common such data types are longitudinal and multilevel data. These arise frequently in applied areas such as education, epidemiology, medicine, population health and social science. These structures induce correlation among observations within groups (clusters), so they require statistical models that account for this correlation during analysis. Linear mixed models extend standard linear models by adding normally distributed random effects on the linear predictor scale to account for correlated observations within groups. However, this flexibility comes at the cost of analytical tractability, and fitting such models can suffer from computational complexity and reduced efficiency. In the Bayesian paradigm, estimation of mixed models is challenging because the integral over the random effects is analytically intractable. It must therefore be approximated, typically by MCMC techniques.

In this paper we use the SAS® Interactive Matrix Language (IML) environment to implement Mean Field Variational Bayes for Bayesian Gaussian semiparametric multilevel models. The IML environment behaves differently from standard SAS procedures and the SAS DATA step, but it is well suited to coding new computational procedures in SAS from start to finish. This is because IML operates on vectors and matrices. That approach will be familiar to users of other analytical software such as R or MATLAB, and the language is intuitive to use. Results computed in IML can easily be written to SAS data sets for use in existing analytic or graphics procedures. They can therefore be integrated seamlessly into analyses in SAS.

Using the IML environment, this paper introduces SAS users to a fast deterministic alternative to MCMC. This alternative provides an approximating distribution with minimum Kullback-Leibler divergence to the posterior of a Bayesian Gaussian semiparametric multilevel model. Simulation comparisons focus mainly on random intercept models with a single covariate that requires smoothing using penalized splines, plus a few typical covariates (continuous and binary). Some knowledge of using splines to smooth non-linear associations between an outcome and a continuous covariate is advantageous, particularly the random-effects formulation (penalized splines). Readers will also get more out of this paper if they understand the following:
  • the covariance structures that arise in multilevel models
  • Bayesian inference using MCMC
  • creating expanded design matrices for categorical covariates
  • the role of centering and standardizing data before an analysis
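The following is a minimal NumPy sketch of coordinate-ascent MFVB for a Gaussian semiparametric random intercept model. It is written under stated assumptions and is not the authors' SAS/IML implementation.

It assumes a standard conjugate setup:
  • fixed effects β ~ N(0, σ_β² I)
  • random-effect blocks u_r ~ N(0, σ_r² I), one for the group intercepts and one for the spline coefficients in the random-effects (penalized spline) formulation
  • Gaussian errors
  • Inverse-Gamma(A, B) priors on every variance, with the mean field factorization q(β, u) q(σ_1²) ... q(σ_ε²)

Several pieces are illustrative assumptions:
  • the truncated-line spline basis with knots at sample quantiles
  • the hyperparameter values
  • the simulated data
  • the helper names `truncated_line_basis` and `mfvb_gaussian_mixed`, which are hypothetical

For simplicity, the sketch inverts the full posterior precision matrix densely rather than using the block decomposition the paper describes. The final lines estimate one derived posterior quantity, the intraclass correlation σ_u²/(σ_u² + σ_ε²), by Monte Carlo simulation from the fitted q-densities.

```python
import numpy as np


def truncated_line_basis(x, num_knots):
    """Hypothetical helper: linear penalized-spline basis (x - kappa_k)_+,
    with knots at equally spaced sample quantiles of x (an assumption)."""
    probs = np.linspace(0.0, 1.0, num_knots + 2)[1:-1]
    knots = np.quantile(x, probs)
    return np.maximum(x[:, None] - knots[None, :], 0.0)


def mfvb_gaussian_mixed(y, X, Z_blocks, sigma_beta2=1e8, A=0.01, B=0.01,
                        max_iter=1000, tol=1e-8):
    """Hypothetical helper: coordinate-ascent MFVB for
    y = X b + sum_r Z_r u_r + e, with b ~ N(0, sigma_beta2 I),
    u_r ~ N(0, s_r^2 I), e ~ N(0, s_e^2 I), s_r^2, s_e^2 ~ IG(A, B).
    Returns the Gaussian q(b, u) and the Inverse-Gamma (shape, rate) parameters."""
    n, p = X.shape
    sizes = np.array([Zr.shape[1] for Zr in Z_blocks])
    C = np.hstack([X] + list(Z_blocks))
    CtC, Cty = C.T @ C, C.T @ y
    # Start index of each random-effect block in the stacked vector (b, u)
    starts = p + np.concatenate([[0], np.cumsum(sizes)[:-1]])
    A_u = A + 0.5 * sizes          # shapes of q(s_r^2); fixed across iterations
    A_e = A + 0.5 * n              # shape of q(s_e^2)
    recip_u = np.ones(len(sizes))  # current E_q[1 / s_r^2]
    recip_e = 1.0                  # current E_q[1 / s_e^2]
    for _ in range(max_iter):
        # Block-diagonal prior precision for the stacked coefficients (b, u)
        prior_prec = np.concatenate(
            [np.full(p, 1.0 / sigma_beta2)]
            + [np.full(k, r) for k, r in zip(sizes, recip_u)])
        Sigma = np.linalg.inv(recip_e * CtC + np.diag(prior_prec))
        mu = recip_e * (Sigma @ Cty)
        # Each q(s_r^2) is updated from its own block of mu and Sigma
        B_u = np.array([
            B + 0.5 * (mu[s:s + k] @ mu[s:s + k] + np.trace(Sigma[s:s + k, s:s + k]))
            for s, k in zip(starts, sizes)])
        resid = y - C @ mu
        B_e = B + 0.5 * (resid @ resid + np.trace(CtC @ Sigma))
        new_u, new_e = A_u / B_u, A_e / B_e
        converged = (np.max(np.abs(np.log(new_u / recip_u))) < tol
                     and abs(np.log(new_e / recip_e)) < tol)
        recip_u, recip_e = new_u, new_e
        if converged:
            break
    return mu, Sigma, A_u, B_u, A_e, B_e


# Synthetic random intercept data with one smooth covariate and one binary covariate
rng = np.random.default_rng(2016)
m, n_i = 30, 10                                  # groups, observations per group
g = np.repeat(np.arange(m), n_i)
x = rng.uniform(0.0, 1.0, m * n_i)
w = rng.binomial(1, 0.5, m * n_i)
u_true = rng.normal(0.0, 1.0, m)                 # random intercepts, SD 1
y = np.sin(2 * np.pi * x) + 0.5 * w + u_true[g] + rng.normal(0.0, 0.5, m * n_i)

X = np.column_stack([np.ones_like(x), x, w])
Z_group = (g[:, None] == np.arange(m)[None, :]).astype(float)
Z_spline = truncated_line_basis(x, num_knots=15)
mu, Sigma, A_u, B_u, A_e, B_e = mfvb_gaussian_mixed(y, X, [Z_group, Z_spline])

# Monte Carlo: draw variances from the fitted Inverse-Gamma q-densities, form the ICC
draws = 10000
s2_group = 1.0 / rng.gamma(A_u[0], 1.0 / B_u[0], draws)
s2_eps = 1.0 / rng.gamma(A_e, 1.0 / B_e, draws)
icc = s2_group / (s2_group + s2_eps)
print("posterior mean of sigma_eps^2:", B_e / (A_e - 1))
print("ICC: mean %.3f, 95%% interval (%.3f, %.3f)"
      % (icc.mean(), *np.quantile(icc, [0.025, 0.975])))
```

Because the mean field factorization makes q(σ_u²) and q(σ_ε²) independent, these Monte Carlo draws ignore any posterior dependence between the two variances. Comparing such summaries against MCMC output, for example from PROC MCMC, is the natural check.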

Similar resources

Bayesian Inference for Generalized Additive Regression based on Dynamic Models

We present a general approach for Bayesian inference via Markov chain Monte Carlo (MCMC) simulation in generalized additive semiparametric and mixed models. It is particularly appropriate for discrete and other fundamentally non-Gaussian responses, where Gibbs sampling techniques developed for Gaussian models cannot be applied. We use the close relation between nonparametric regression and dynamic o...

Approximate Bayesian Inference for Latent Gaussian Models Using Integrated Nested Laplace Approximations

Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalised) linear models, (generalised) additive models, smoothing-spline models, state-space models, semiparametric regression, spatial and spatio-temporal models, log-Gaussian Cox-processes, and geostatistical models. In this paper we consider app...

Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations

Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider app...

Variational Gaussian Copula Inference

We utilize copulas to constitute a unified framework for constructing and optimizing variational proposals in hierarchical Bayesian models. For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational Gaussian copula approach, in which the parametric Gaussian copula family is able to preserve multivariate posterior dependence, and the nonpa...

NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET Approximate Bayesian Inference for Latent Gaussian Models Using Integrated Nested Laplace Approximations

We are concerned with Bayesian inference for latent Gaussian models, that is models involving a Gaussian latent field (in a broad sense), controlled by few parameters. This is perhaps the class of models most commonly encountered in applications: the latent Gaussian field can represent, for instance, a mix of smoothing splines or smooth curves, temporal and spatial processes. Hence, popular smo...

Publication date: 2016